Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1949827: Add KUBELET_NODEIP_HINT to nodeip-configuration #2888

Merged
merged 1 commit into from
Mar 31, 2022

Conversation

cybertron
Copy link
Member

@cybertron cybertron commented Dec 21, 2021

When we added the nodeip-configuration service for None platform
deployments, we broke some existing users who were relying on the
(largely undefined) previous behavior Kubelet used to select its
node ip. While it is possible to work around this by overriding the
node ip selection logic, that's very cumbersome and not an acceptable
user experience.

This change adds a KUBELET_NODEIP_HINT env variable that can be used
to override the default behavior of runtimecfg when selecting a node
ip. When the variable is unset, the old behavior of selecting an
address on the interface of the default route will take effect. When
the variable is set, its value will be passed to runtimecfg like a
VIP for the IPI platforms. This will cause runtimecfg to prefer an
address in the same subnet as the one provided in
KUBELET_NODEIP_HINT. If no such address is found, it will fall back
to the default route logic as before.

KUBELET_NODEIP_HINT can be set using a systemd environment file.
The file must be named /etc/default/nodeip-configuration
with contents such as (replacing the IP as appropriate):

KUBELET_NODEIP_HINT=192.0.2.1

This file should be created using a machine-config manifest that is
passed to the installer so it will take effect on initial deployment.
The node ip cannot be changed after the node registers initially so
this cannot be done as a day 2 operation.

Note that the IP specified in the hint does not necessarily need to
exist in the environment, it just needs to be in the correct subnet.
No traffic will be sent to this address.

- What I did

- How to verify it

- Description for the changelog

Added a mechanism for deployers to override the default node ip selection logic in None platform deployments.

@openshift-ci openshift-ci bot added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Dec 21, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 21, 2021

@cybertron: This pull request references Bugzilla bug 1949827, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla ([email protected]), skipping review request.

In response to this:

Bug 1949827: Add KUBELET_NODEIP_HINT to nodeip-configuration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cybertron
Copy link
Member Author

/cc @danwinship

This is essentially what you suggested as a workaround on the bz. I went with a more generic env var name in case some of these deployments did not want the nodes listening on the same subnet as the api server (although...is that even valid?). I also used the built-in override mechanism in systemd to specify the value which kept the changes to the service file minimal.

I semi-successfully tested this in an IPI deployment by hacking up the nodeip-configuration for IPI to do the same thing. It seemed to work, although I couldn't complete deployment that way because my bootstrap node wasn't connected to the additional network I specified so there was no connectivity with the nodes. They did select the correct address to bind to though.

@openshift-ci openshift-ci bot requested a review from danwinship December 21, 2021 22:27
@cybertron
Copy link
Member Author

/retest-required
/test e2e-vsphere-upi

@kikisdeliveryservice
Copy link
Contributor

/retest-required

@danwinship
Copy link
Contributor

in case some of these deployments did not want the nodes listening on the same subnet as the api server (although...is that even valid?)

I think so, though it would be weird...

I also used the built-in override mechanism in systemd to specify the value which kept the changes to the service file minimal.

I used EnvironmentFile in my example because it's a lot easier for the user that way. The MachineConfig just needs to say:

source: data:,KUBELET_NODEIP_HINT=192.0.2.1

rather than:

source: data:base64,W1NlcnZpY2VdCkVudmlyb25tZW50PSJLVUJFTEVUX05PREVJUF9ISU5UPTE5Mi4wLjIuMSIK

@danwinship
Copy link
Contributor

/lgtm
anyway

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 5, 2022
@cybertron
Copy link
Member Author

I also used the built-in override mechanism in systemd to specify the value which kept the changes to the service file minimal.

I used EnvironmentFile in my example because it's a lot easier for the user that way. The MachineConfig just needs to say:

source: data:,KUBELET_NODEIP_HINT=192.0.2.1

rather than:

source: data:base64,W1NlcnZpY2VdCkVudmlyb25tZW50PSJLVUJFTEVUX05PREVJUF9ISU5UPTE5Mi4wLjIuMSIK

Hmm, yeah. I guess that must have been a holdover from when I tried to implement this only in baremetal-runtimecfg (which didn't work because podman doesn't pass env vars through by default, so service changes were needed anyway). Let me try switching this to EnvironmentFile since there's no reason to do it this way now.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 6, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 6, 2022

@cybertron: This pull request references Bugzilla bug 1949827, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla ([email protected]), skipping review request.

In response to this:

Bug 1949827: Add KUBELET_NODEIP_HINT to nodeip-configuration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cybertron
Copy link
Member Author

Okay, I switched to EnvironmentFile. I had to add a default one that sets the variable to empty or the service fails when you don't create the file. Otherwise this is pretty much the same as before.

@cybertron
Copy link
Member Author

Okay, tested this version locally and it works. I think it should be good to go.

@danwinship
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2022
@kikisdeliveryservice
Copy link
Contributor

@cybertron can you make sure to get the ocp docs updated to include this (if it doesn't already)?

Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For such a small change would you mind squashing the commits into one - esp since some of them aren't relevant to final implementation?

When we added the nodeip-configuration service for None platform
deployments, we broke some existing users who were relying on the
(largely undefined) previous behavior Kubelet used to select its
node ip. While it is possible to work around this by overriding the
node ip selection logic, that's very cumbersome and not an acceptable
user experience.

This change adds a KUBELET_NODEIP_HINT env variable that can be used
to override the default behavior of runtimecfg when selecting a node
ip. When the variable is unset, the old behavior of selecting an
address on the interface of the default route will take effect. When
the variable is set, its value will be passed to runtimecfg like a
VIP for the IPI platforms. This will cause runtimecfg to prefer an
address in the same subnet as the one provided in
KUBELET_NODEIP_HINT. If no such address is found, it will fall back
to the default route logic as before.

KUBELET_NODEIP_HINT can be set using a systemd environment file.
The file must be named /etc/default/nodeip-configuration
with contents such as (replacing the IP as appropriate):

KUBELET_NODEIP_HINT=192.0.2.1

This file should be created using a machine-config manifest that is
passed to the installer so it will take effect on initial deployment.
The node ip cannot be changed after the node registers initially so
this cannot be done as a day 2 operation.

Note that the IP specified in the hint does not necessarily need to
exist in the environment, it just needs to be in the correct subnet.
No traffic will be sent to this address.

Co-authored-by: Dan Winship <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 28, 2022
@cybertron
Copy link
Member Author

Squashed the changes and I'll talk to the docs folks about where to put this.

@openshift-bot
Copy link
Contributor

/bugzilla refresh

The requirements for Bugzilla bugs have changed (BZs linked to PRs on master branch need to target OCP 4.11), recalculating validity.

@openshift-ci openshift-ci bot removed the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jan 28, 2022
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

13 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 31, 2022

@cybertron: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-upgrade-single-node a0c9a3c link false /test e2e-aws-upgrade-single-node
ci/prow/e2e-aws-workers-rhel7 a0c9a3c link false /test e2e-aws-workers-rhel7
ci/prow/e2e-aws-disruptive a0c9a3c link false /test e2e-aws-disruptive
ci/prow/e2e-aws-workers-rhel8 a0c9a3c link false /test e2e-aws-workers-rhel8
ci/prow/e2e-aws-single-node a0c9a3c link false /test e2e-aws-single-node
ci/prow/4.12-upgrade-from-stable-4.11-images a0c9a3c link true /test 4.12-upgrade-from-stable-4.11-images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 5a571cd into openshift:master Mar 31, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 31, 2022

@cybertron: All pull requests linked via external trackers have merged:

Bugzilla bug 1949827 has been moved to the MODIFIED state.

In response to this:

Bug 1949827: Add KUBELET_NODEIP_HINT to nodeip-configuration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cybertron
Copy link
Member Author

/cherry-pick release-4.10

@openshift-cherrypick-robot

@cybertron: new pull request created: #3058

In response to this:

/cherry-pick release-4.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cybertron added a commit to cybertron/enhancements that referenced this pull request Jul 8, 2022
As OpenShift is deployed in increasingly complex networking environments, we
have gotten many requests for more control over which interface is used for
the primary node IP. We provided a basic mechanism for this with
[KUBELET_NODEIP_HINT](openshift/machine-config-operator#2888)
but as users have started to exercise that some significant limitations have
come to light.
cybertron added a commit to cybertron/enhancements that referenced this pull request Jul 21, 2022
As OpenShift is deployed in increasingly complex networking environments, we
have gotten many requests for more control over which interface is used for
the primary node IP. We provided a basic mechanism for this with
[KUBELET_NODEIP_HINT](openshift/machine-config-operator#2888)
but as users have started to exercise that some significant limitations have
come to light.
cybertron added a commit to cybertron/enhancements that referenced this pull request Jul 29, 2022
As OpenShift is deployed in increasingly complex networking environments, we
have gotten many requests for more control over which interface is used for
the primary node IP. We provided a basic mechanism for this with
[KUBELET_NODEIP_HINT](openshift/machine-config-operator#2888)
but as users have started to exercise that some significant limitations have
come to light.
cybertron added a commit to cybertron/enhancements that referenced this pull request Jul 29, 2022
As OpenShift is deployed in increasingly complex networking environments, we
have gotten many requests for more control over which interface is used for
the primary node IP. We provided a basic mechanism for this with
[KUBELET_NODEIP_HINT](openshift/machine-config-operator#2888)
but as users have started to exercise that some significant limitations have
come to light.
cybertron added a commit to cybertron/enhancements that referenced this pull request Sep 1, 2022
As OpenShift is deployed in increasingly complex networking environments, we
have gotten many requests for more control over which interface is used for
the primary node IP. We provided a basic mechanism for this with
[KUBELET_NODEIP_HINT](openshift/machine-config-operator#2888)
but as users have started to exercise that some significant limitations have
come to light.
cybertron added a commit to cybertron/enhancements that referenced this pull request Sep 1, 2022
As OpenShift is deployed in increasingly complex networking environments, we
have gotten many requests for more control over which interface is used for
the primary node IP. We provided a basic mechanism for this with
[KUBELET_NODEIP_HINT](openshift/machine-config-operator#2888)
but as users have started to exercise that some significant limitations have
come to light.
cybertron added a commit to cybertron/enhancements that referenced this pull request Sep 6, 2022
As OpenShift is deployed in increasingly complex networking environments, we
have gotten many requests for more control over which interface is used for
the primary node IP. We provided a basic mechanism for this with
[KUBELET_NODEIP_HINT](openshift/machine-config-operator#2888)
but as users have started to exercise that some significant limitations have
come to light.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants